{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "52824b89-532a-4e54-87e9-1410813cd39e",
   "metadata": {},
   "source": [
    "# LangChain: Q&A over Documents\n",
    "\n",
    "An example might be a tool that would allow you to query a product catalog for items of interest."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "10c1f7b9",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "#pip install --upgrade langchain"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "b7ed03ed-1322-49e3-b2a2-33e94fb592ef",
   "metadata": {
    "height": 81,
    "tags": []
   },
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "from dotenv import load_dotenv, find_dotenv\n",
    "_ = load_dotenv(find_dotenv()) # read local .env file"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ead635a0-42e2-46cc-a9f7-98419eceae6d",
   "metadata": {},
   "source": [
    "Note: LLM's do not always produce the same results. When executing the code in your notebook, you may get slightly different answers that those in the video."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "cc533037-0b8c-4995-96a3-45b35fa13c18",
   "metadata": {
    "height": 234
   },
   "outputs": [],
   "source": [
    "# account for deprecation of LLM model\n",
    "import datetime\n",
    "# Get the current date\n",
    "current_date = datetime.datetime.now().date()\n",
    "\n",
    "# Define the date after which the model should be set to \"gpt-3.5-turbo\"\n",
    "target_date = datetime.date(2024, 6, 12)\n",
    "\n",
    "# Set the model variable based on the current date\n",
    "if current_date > target_date:\n",
    "    llm_model = \"gpt-3.5-turbo\"\n",
    "else:\n",
    "    llm_model = \"gpt-3.5-turbo-0301\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "974acf8e-8f88-42de-88f8-40a82cb58e8b",
   "metadata": {
    "height": 115
   },
   "outputs": [],
   "source": [
    "from langchain.chains import RetrievalQA\n",
    "from langchain.chat_models import ChatOpenAI\n",
    "from langchain.document_loaders import CSVLoader\n",
    "from langchain.vectorstores import DocArrayInMemorySearch\n",
    "from IPython.display import display, Markdown\n",
    "from langchain.llms import OpenAI"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "7249846e",
   "metadata": {
    "height": 47
   },
   "outputs": [],
   "source": [
    "file = 'OutdoorClothingCatalog_1000.csv'\n",
    "loader = CSVLoader(file_path=file)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "5bfaba30",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "from langchain.indexes import VectorstoreIndexCreator"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "9b5ab657",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "#pip install docarray"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "9e200726",
   "metadata": {
    "height": 64
   },
   "outputs": [],
   "source": [
    "index = VectorstoreIndexCreator(\n",
    "    vectorstore_cls=DocArrayInMemorySearch\n",
    ").from_loaders([loader])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "34562d81",
   "metadata": {
    "height": 47
   },
   "outputs": [],
   "source": [
    "query =\"Please list all your shirts with sun protection \\\n",
    "in a table in markdown and summarize each one.\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a9a4c615-d6e4-4dd6-bc53-a9c46df7276c",
   "metadata": {},
   "source": [
    "**Note**:\n",
    "- The notebook uses `langchain==0.0.179` and `openai==0.27.7`\n",
    "- For these library versions, `VectorstoreIndexCreator` uses `text-davinci-003` as the base model, which has been deprecated since 1 January 2024.\n",
    "- The replacement model, `gpt-3.5-turbo-instruct` will be used instead for the `query`.\n",
    "- The `response` format might be different than the video because of this replacement model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "cfd0cc37",
   "metadata": {
    "height": 115
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\"\\n\\n| Name | Description | Sun Protection Rating |\\n| --- | --- | --- |\\n| Men's Tropical Plaid Short-Sleeve Shirt | Made of 100% polyester, UPF 50+ rating, wrinkle-resistant, front and back cape venting, two front bellows pockets | SPF 50+, blocks 98% of harmful UV rays |\\n| Men's Plaid Tropic Shirt, Short-Sleeve | Made of 52% polyester and 48% nylon, UPF 50+ rating, SunSmart technology, wrinkle-free, front and back cape venting, two front bellows pockets | SPF 50+, blocks 98% of harmful UV rays |\\n| Men's TropicVibe Shirt, Short-Sleeve | Made of 71% nylon and 29% polyester, UPF 50+ rating, front and back cape venting, two front bellows pockets | SPF 50+, blocks 98% of harmful UV rays |\\n| Sun Shield Shirt | Made of 78% nylon and 22% Lycra Xtra Life fiber, UPF 50+ rating, moisture-wicking, abrasion-resistant, fits over swimsuit | SPF 50+, blocks 98% of harmful UV rays |\""
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "llm_replacement_model = OpenAI(temperature=0, \n",
    "                               model='gpt-3.5-turbo-instruct')\n",
    "\n",
    "response = index.query(query, \n",
    "                       llm = llm_replacement_model)\n",
    "response"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "ae21f1ff",
   "metadata": {
    "height": 30,
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "\n",
       "\n",
       "| Name | Description | Sun Protection Rating |\n",
       "| --- | --- | --- |\n",
       "| Men's Tropical Plaid Short-Sleeve Shirt | Made of 100% polyester, UPF 50+ rating, wrinkle-resistant, front and back cape venting, two front bellows pockets | SPF 50+, blocks 98% of harmful UV rays |\n",
       "| Men's Plaid Tropic Shirt, Short-Sleeve | Made of 52% polyester and 48% nylon, UPF 50+ rating, SunSmart technology, wrinkle-free, front and back cape venting, two front bellows pockets | SPF 50+, blocks 98% of harmful UV rays |\n",
       "| Men's TropicVibe Shirt, Short-Sleeve | Made of 71% nylon and 29% polyester, UPF 50+ rating, front and back cape venting, two front bellows pockets | SPF 50+, blocks 98% of harmful UV rays |\n",
       "| Sun Shield Shirt | Made of 78% nylon and 22% Lycra Xtra Life fiber, UPF 50+ rating, moisture-wicking, abrasion-resistant, fits over swimsuit | SPF 50+, blocks 98% of harmful UV rays |"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "display(Markdown(response))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2534597e-4b0c-4563-a208-e2dd91064438",
   "metadata": {},
   "source": [
    "## Step By Step"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "631396c6",
   "metadata": {
    "height": 47
   },
   "outputs": [],
   "source": [
    "from langchain.document_loaders import CSVLoader\n",
    "loader = CSVLoader(file_path=file)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "6c2164b5",
   "metadata": {
    "height": 47
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1000"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "docs = loader.load()\n",
    "len(docs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "4a977f44",
   "metadata": {
    "height": 30
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Document(page_content=\": 0\\nname: Women's Campside Oxfords\\ndescription: This ultracomfortable lace-to-toe Oxford boasts a super-soft canvas, thick cushioning, and quality construction for a broken-in feel from the first time you put them on. \\n\\nSize & Fit: Order regular shoe size. For half sizes not offered, order up to next whole size. \\n\\nSpecs: Approx. weight: 1 lb.1 oz. per pair. \\n\\nConstruction: Soft canvas material for a broken-in feel and look. Comfortable EVA innersole with Cleansport NXT® antimicrobial odor control. Vintage hunt, fish and camping motif on innersole. Moderate arch contour of innersole. EVA foam midsole for cushioning and support. Chain-tread-inspired molded rubber outsole with modified chain-tread pattern. Imported. \\n\\nQuestions? Please contact us for any inquiries.\", metadata={'source': 'OutdoorClothingCatalog_1000.csv', 'row': 0})"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "docs[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "e875693a",
   "metadata": {
    "height": 47
   },
   "outputs": [],
   "source": [
    "from langchain.embeddings import OpenAIEmbeddings\n",
    "embeddings = OpenAIEmbeddings()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "779bec75",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "embed = embeddings.embed_query(\"Hi my name is Harrison\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "699aaaf9",
   "metadata": {
    "height": 30
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1536\n"
     ]
    }
   ],
   "source": [
    "print(len(embed))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "9d00d346",
   "metadata": {
    "height": 30
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[-0.02199048176407814, 0.006746508646756411, -0.018174780532717705, -0.03918623551726341, -0.01404528971761465]\n"
     ]
    }
   ],
   "source": [
    "print(embed[:5])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "27ad0bb0",
   "metadata": {
    "height": 81
   },
   "outputs": [],
   "source": [
    "db = DocArrayInMemorySearch.from_documents(\n",
    "    docs, \n",
    "    embeddings\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "0329bfd5",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "query = \"Please suggest a shirt with sunblocking\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "7909c6b7",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "docs = db.similarity_search(query)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "43321853",
   "metadata": {
    "height": 30
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "4"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(docs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "6eba90b5",
   "metadata": {
    "height": 30
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Document(page_content=': 255\\nname: Sun Shield Shirt by\\ndescription: \"Block the sun, not the fun – our high-performance sun shirt is guaranteed to protect from harmful UV rays. \\n\\nSize & Fit: Slightly Fitted: Softly shapes the body. Falls at hip.\\n\\nFabric & Care: 78% nylon, 22% Lycra Xtra Life fiber. UPF 50+ rated – the highest rated sun protection possible. Handwash, line dry.\\n\\nAdditional Features: Wicks moisture for quick-drying comfort. Fits comfortably over your favorite swimsuit. Abrasion resistant for season after season of wear. Imported.\\n\\nSun Protection That Won\\'t Wear Off\\nOur high-performance fabric provides SPF 50+ sun protection, blocking 98% of the sun\\'s harmful rays. This fabric is recommended by The Skin Cancer Foundation as an effective UV protectant.', metadata={'source': 'OutdoorClothingCatalog_1000.csv', 'row': 255})"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "docs[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "c0c3596e",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "retriever = db.as_retriever()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "0625f5e8",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "llm = ChatOpenAI(temperature = 0.0, model=llm_model)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "a573f58a",
   "metadata": {
    "height": 47
   },
   "outputs": [],
   "source": [
    "qdocs = \"\".join([docs[i].page_content for i in range(len(docs))])\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "14682d95",
   "metadata": {
    "height": 64
   },
   "outputs": [],
   "source": [
    "response = llm.call_as_llm(f\"{qdocs} Question: Please list all your \\\n",
    "shirts with sun protection in a table in markdown and summarize each one.\") \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "8bba545b",
   "metadata": {
    "height": 30
   },
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "| Name                           | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                "
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "display(Markdown(response))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "32c94d22",
   "metadata": {
    "height": 115
   },
   "outputs": [],
   "source": [
    "qa_stuff = RetrievalQA.from_chain_type(\n",
    "    llm=llm, \n",
    "    chain_type=\"stuff\", \n",
    "    retriever=retriever, \n",
    "    verbose=True\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "id": "e4769316",
   "metadata": {
    "height": 47
   },
   "outputs": [],
   "source": [
    "query =  \"Please list all your shirts with sun protection in a table \\\n",
    "in markdown and summarize each one.\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "id": "1fc3c2f3",
   "metadata": {
    "height": 30
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\u001b[1m> Entering new RetrievalQA chain...\u001b[0m\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n"
     ]
    }
   ],
   "source": [
    "response = qa_stuff.run(query)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "id": "fba1a5db",
   "metadata": {
    "height": 30
   },
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "| Shirt Name                           | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                "
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "display(Markdown(response))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "id": "500ec062",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "response = index.query(query, llm=llm)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2cffb19f",
   "metadata": {
    "height": 81
   },
   "outputs": [],
   "source": [
    "index = VectorstoreIndexCreator(\n",
    "    vectorstore_cls=DocArrayInMemorySearch,\n",
    "    embedding=embeddings,\n",
    ").from_loaders([loader])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b325340f-26b4-4c7e-81da-da4b895ae058",
   "metadata": {},
   "source": [
    "Reminder: Download your notebook to you local computer to save your work."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a9b58916",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d590b337",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b2cb587c",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1ec249f1",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9d64f166",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "21322e7e",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.19"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
